Code
# We load the MAE object and retrieve the clinical data
mae <- load_latest_mae(str_c(data_dir(), "03_augmented_mae/"))
clin <- MultiAssayExperiment::colData(mae) %>% as.data.frame()# We load the MAE object and retrieve the clinical data
mae <- load_latest_mae(str_c(data_dir(), "03_augmented_mae/"))
clin <- MultiAssayExperiment::colData(mae) %>% as.data.frame()target_visits <- c(4, 7)To assess whether participants with recurrent BV (or lack of Lactobacillus-dominance) have a microbiota composition similar to their initial BV composition, we primarily rely on the Bray-Curtis dissimilarity between the pre-MTZ microbiota composition expressed as ASV relative abundances (the lowest possible resolution given 16S data) and the corresponding composition at the target visit (the week 24 visit as primary target visit and the week 12 visit for a sensitivity analysis).
Participants were included in this analysis if they were diagnosed with BV or who had less than 50% Lactobacillus at the target visit and microbiota composition data was availble at the pre-MTZ visit. Analyses will distinguish between participants with or without BV at the target visit to assess whether conclusions differ based on BV diagnosis, using the same criteria for BV diagnosis as in Cohen et al, 2020.
Because some participants with diverse microbiota may have a composition similar to other participants with diverse microbiota, we will not only evaluate the similarity to their own microbiota, but also compare it to the similarity with the microbiota of other participants to compute a percentile rank. Someone with a small percentile rank is more similar to themselves than to everyone else pre-MTZ. Some participants could be relatively similar to themselves but be more similar to other participants if several participants were enrolled with similarly diverse microbiota.
From the dissimilarity matrix \(D\) where \(d_{ij}\) is the dissimilarity between participants \(i\) pre-MTZ and participant \(j\) at the target visit, we can also construct the minimum spanning tree (MST) to visualize the relationships between participants and count the number of “pure edges” (i.e., edges connecting the two visits (pre-MTZ and target visits) of the same participant). We can then test whether the observed number of pure edges is significantly higher than expected by chance using a graph permutation test.
We will evaluate whether these different metrics differ by arm to understand if the LBP (Lactin-V) disrupts the microbiota of “non-respondent” participants more than the Placebo does.
The analyses are grouped in a function recurrent_BV_composition_analysis() that takes the MAE object and the target visit as input and returns a list of results, including the dissimilarity to self and others, the MST, and the graph permutation test results.
recurrent_BV_composition_analysis
## function (mae, target_visit)
## {
## target_visit <- as.numeric(match.arg(as.character(target_visit),
## choices = as.character(target_visits), several.ok = FALSE))
## df <- select(left_join(full_join(distinct(select(filter(as_tibble(MultiAssayExperiment::colData(mae)),
## has_V0, AVISITN == target_visit), USUBJID, BV)), mutate(select(mutate(filter(get_assay_wide_format(mae,
## "prop_Lacto"), AVISITN == target_visit), prop_Lacto_target = assay$prop_Lacto),
## USUBJID, prop_Lacto_target), mBV = prop_Lacto_target <
## 0.5), by = join_by(USUBJID)), dplyr::rename(distinct(select(as_tibble(MultiAssayExperiment::colData(mae)),
## USUBJID, ARM)), Arm = ARM), by = join_by(USUBJID)), USUBJID,
## Arm, everything())
## df <- filter(df, (BV == "Yes") | mBV)
## outcome_table <- gt::gt(pivot_wider(mutate(dplyr::count(df,
## BV, mBV), BV = str_replace_na(BV, "Unknown"), mBV = ifelse(mBV,
## "< 50% Lactobacillus", "≥ 50% Lactobacillus")), names_from = mBV,
## values_from = n, values_fill = 0), caption = "Number of participants included in the analysis by BV diagnosis and Lactobacillus dominance at the week 24 visit.")
## ASV_prop <- select(filter(get_assay_wide_format(mae, "ASV_16S_filtered_p"),
## USUBJID %in% df$USUBJID, AVISITN %in% c(0, target_visit)),
## USUBJID, AVISITN, assay)
## ASV_target <- select(mutate(filter(ASV_prop, AVISITN == target_visit),
## ASV_p_target = set_rownames(as.matrix(assay), USUBJID)),
## USUBJID, ASV_p_target)
## ASV_pre_MTZ <- select(mutate(filter(ASV_prop, AVISITN ==
## 0), ASV_p_pre_MTZ = set_rownames(as.matrix(assay), USUBJID)),
## USUBJID, ASV_p_pre_MTZ)
## df <- left_join(left_join(select(df, -any_of(c("ASV_p_target",
## "ASV_p_pre_MTZ"))), ASV_target, by = "USUBJID"), ASV_pre_MTZ,
## by = "USUBJID")
## rm(ASV_prop, ASV_target, ASV_pre_MTZ)
## df <- mutate(df, BC_pre_MTZ_target = 1 - 2 * rowSums(pmin(df$ASV_p_target,
## df$ASV_p_pre_MTZ, na.rm = TRUE))/rowSums(df$ASV_p_target +
## df$ASV_p_pre_MTZ))
## BC_all_others <- ungroup(mutate(arrange(group_by(mutate(arrange(mutate(pivot_longer(rownames_to_column(as.data.frame(ValenciaR::BC(df$ASV_p_target,
## drop_na(as.data.frame(df$ASV_p_pre_MTZ)))), "USUBJID"),
## cols = -USUBJID, names_to = "pre_MTZ_USUBJID", values_to = "BC"),
## same_USUBJID = USUBJID == pre_MTZ_USUBJID, same_USUBJID_label = fct_rev(fct_infreq(ifelse(same_USUBJID,
## "Same participant", "Other participants")))), !same_USUBJID,
## BC), USUBJID = fct_inorder(USUBJID), order_BC = as.integer(USUBJID)),
## USUBJID), USUBJID, BC), has_own_BC = any(same_USUBJID),
## BC_mean = mean(BC, na.rm = TRUE), BC_sd = sd(BC, na.rm = TRUE),
## z_score = (BC - BC_mean)/BC_sd, percentile_rank = ecdf(BC)(BC) *
## 100))
## df <- left_join(select(df, -any_of("percentile_rank")), select(filter(BC_all_others,
## same_USUBJID), USUBJID, percentile_rank), by = join_by(USUBJID))
## BC_all_others <- mutate(arrange(ungroup(mutate(group_by(mutate(BC_all_others,
## is_smallest_distance = same_USUBJID & (BC == min(BC[same_USUBJID],
## na.rm = TRUE)), is_smallest_percentile_rank = same_USUBJID &
## (percentile_rank == min(percentile_rank[same_USUBJID],
## na.rm = TRUE)), is_largest_distance_within_smallest_percentile_rank = is_smallest_percentile_rank &
## (BC == max(BC[is_smallest_percentile_rank], na.rm = TRUE)),
## combo = 100 * (1 - BC) + percentile_rank, is_smallest_combo = same_USUBJID &
## (combo == min(combo[same_USUBJID], na.rm = TRUE)),
## is_largest_distance = same_USUBJID & (BC == max(BC[same_USUBJID],
## na.rm = TRUE)), is_largest_percentile_rank = same_USUBJID &
## (percentile_rank == max(percentile_rank[same_USUBJID],
## na.rm = TRUE))), USUBJID), note = case_when(any(is_smallest_distance) ~
## "smallest distance", any(is_largest_distance_within_smallest_percentile_rank) ~
## "smallest percentile rank with largest BC", any(is_smallest_combo) ~
## "smallest combo", any(is_largest_distance) ~ "largest distance",
## any(is_largest_percentile_rank) ~ "largest percentile rank",
## TRUE ~ ""))), order_BC), note = fct_relevel(fct_inorder(factor(note)),
## "", after = Inf), pid_label = str_replace(LETTERS[as.integer(note)],
## LETTERS[nlevels(note)], ""))
## df <- mutate(left_join(left_join(select(df, -any_of(c("prop_Lacto_pre_MTZ",
## "prop_Lacto_post_MTZ", "delta_prop_Lacto"))), select(mutate(filter(get_assay_wide_format(mae,
## "prop_Lacto"), AVISITN == 0), prop_Lacto_pre_MTZ = assay$prop_Lacto),
## USUBJID, prop_Lacto_pre_MTZ), by = join_by(USUBJID)),
## select(mutate(filter(get_assay_wide_format(mae, "prop_Lacto"),
## AVISITN == 1), prop_Lacto_post_MTZ = assay$prop_Lacto),
## USUBJID, prop_Lacto_post_MTZ), by = join_by(USUBJID)),
## delta_prop_Lacto = prop_Lacto_post_MTZ - prop_Lacto_pre_MTZ)
## df <- left_join(select(df, -any_of(c("prop_Lacto_max"))),
## summarize(group_by(arrange(select(mutate(filter(get_assay_wide_format(mae,
## "prop_Lacto"), AVISITN != 0, AVISITN < target_visit),
## prop_Lacto = assay$prop_Lacto), USUBJID, AVISITN,
## prop_Lacto), USUBJID, AVISITN), USUBJID), prop_Lacto_max = max(prop_Lacto,
## na.rm = TRUE), .groups = "drop"), by = join_by(USUBJID))
## df <- left_join(select(df, -any_of(c("largest_BC_by_w8"))),
## summarize(group_by(mutate(left_join(select(df, USUBJID,
## ASV_p_pre_MTZ), arrange(select(mutate(filter(get_assay_wide_format(mae,
## "ASV_16S_filtered_p"), AVISITN != 0, AVISITN < 4),
## ASV_p_other = as.matrix(assay[, colnames(df$ASV_p_pre_MTZ)])),
## USUBJID, AVISITN, ASV_p_other), USUBJID, AVISITN),
## by = join_by(USUBJID)), BC = 1 - 2 * rowSums(pmin(ASV_p_pre_MTZ,
## ASV_p_other, na.rm = TRUE))/rowSums(ASV_p_pre_MTZ +
## ASV_p_other)), USUBJID), largest_BC_by_w8 = ifelse(any(!is.na(BC)),
## max(BC, na.rm = TRUE), NA), .groups = "drop"), by = join_by(USUBJID))
## df <- left_join(select(df, -any_of(c("BC_pre_post", "BC_pre_post_abs"))),
## select(mutate(mutate(left_join(left_join(select(df, USUBJID,
## ASV_p_pre_MTZ), arrange(select(mutate(filter(get_assay_wide_format(mae,
## "ASV_16S_filtered_p"), AVISITN == 1), ASV_p_post_MTZ = as.matrix(assay[,
## colnames(df$ASV_p_pre_MTZ)])), USUBJID, AVISITN,
## ASV_p_post_MTZ), USUBJID, AVISITN), by = join_by(USUBJID)),
## select(mutate(distinct(filter(select(as_tibble(mae@colData),
## USUBJID, AVISITN, LOAD), AVISITN == 1)), post_MTZ_load = replace_na(LOAD,
## median(LOAD, na.rm = TRUE)), pre_MTZ_load = median(mae$LOAD[mae$AVISITN !=
## 1], na.rm = TRUE)), USUBJID, post_MTZ_load, pre_MTZ_load),
## by = join_by(USUBJID)), ASV_pre_MTZ_abs = ASV_p_pre_MTZ *
## pre_MTZ_load, ASV_post_MTZ_abs = ASV_p_post_MTZ *
## post_MTZ_load), BC_pre_post_abs = 1 - 2 * rowSums(pmin(ASV_pre_MTZ_abs,
## ASV_pre_MTZ_abs, na.rm = TRUE))/rowSums(ASV_pre_MTZ_abs +
## ASV_pre_MTZ_abs), BC_pre_post = 1 - 2 * rowSums(pmin(ASV_p_pre_MTZ,
## ASV_p_post_MTZ, na.rm = TRUE))/rowSums(ASV_p_pre_MTZ +
## ASV_p_post_MTZ)), USUBJID, BC_pre_post_abs, BC_pre_post),
## by = join_by(USUBJID))
## ASV_V0Vt <- mutate(filter(get_assay_wide_format(mae, "ASV_16S_filtered"),
## USUBJID %in% df$USUBJID, AVISITN %in% c(0, target_visit)),
## ID = paste(USUBJID, AVISITN, sep = "_"), )
## BC_distances <- vegan::vegdist(ASV_V0Vt$assay %>% as.matrix() %>%
## set_rownames(ASV_V0Vt$ID), method = "bray")
## ps_ASV_V0Vt <- phyloseq::phyloseq(otu_table(ASV_V0Vt$assay,
## taxa_are_rows = FALSE), sample_data(ASV_V0Vt %>% select(-assay)))
## gt <- graph_perm_test(ps_ASV_V0Vt, sampletype = "USUBJID",
## distance = "(A+B-2*J)/(A+B)", type = "mst", nperm = 1000)
## list(mae = mae, target_visit = target_visit, df = df, outcome_table = outcome_table,
## BC_all_others = BC_all_others, ASV_V0Vt = ASV_V0Vt, BC_distances = BC_distances,
## gt = gt)
## }The function make_figures_recurrent_BV_composition() generates figures to visualize the results.
make_figures_recurrent_BV_composition
## function (res)
## {
## BC_distributions <- make_distribution_figures(df = res$df,
## var = "BC_pre_MTZ_target", target_visit = res$target_visit)
## PR_distributions <- make_distribution_figures(df = res$df,
## var = "percentile_rank", target_visit = res$target_visit)
## BC_and_PR <- BC_and_PR_each_participant(BC_all_others = res$BC_all_others,
## df = res$df, target_visit = res$target_visit)
## examples <- show_examples(res = res)
## BC_and_PR_by_history <- BC_and_PR_by_history(df = res$df,
## target_visit = res$target_visit)
## MST <- plot_MST(distances = res$BC_distances, sample_data = res$ASV_V0Vt) +
## ggtitle("MST based on Bray-Curtis distance")
## main_figure <- free(BC_and_PR[[1]]) + free(BC_and_PR[[2]] +
## guides(shape = "none")) + free(examples) + BC_distributions$g_by_arm_and_BV +
## PR_distributions$g_by_arm_and_BV + plot_layout(heights = c(2,
## 1), widths = c(3, 2, 2, 2), design = "\n ABCC\n ABDE\n ") +
## plot_annotation(tag_levels = "A")
## list(BC_distributions = BC_distributions, PR_distributions = PR_distributions,
## BC_and_PR = BC_and_PR, examples = examples, BC_and_PR_by_history = BC_and_PR_by_history,
## MST = MST, main_figure = main_figure)
## }res_week_24 <- recurrent_BV_composition_analysis(mae, target_visit = 7)
figs_week_24 <- make_figures_recurrent_BV_composition(res_week_24)res_week_24$outcome_table| BV | < 50% Lactobacillus |
|---|---|
| Yes | 40 |
| No | 41 |
| Unknown | 2 |
Roughly half of participants who had <50% Lactobacillus at the week 24 visit were diagnosed with BV at that visit. All participants who were diagnosed with BV at the week 24 visit had <50% Lactobacillus at that visit.
figs_week_24$MST + plot_permutations(res_week_24$gt) +
plot_annotation(
tag_levels = "A"
) &
theme(plot.tag = element_text(size = 16, face = "bold")) figs_week_24$main_figuremake_summary_stat_table(res = res_week_24)| cat | All | LACTIN-V | Placebo |
|---|---|---|---|
| Most similar to self (min perc. rank) | 18 (22%) [14% - 33%] | 10 (22%) [11% - 37%] | 8 (23%) [11% - 41%] |
| More similar to self than to others (> min. but < 50th perc. rank) | 45 (56%) [44% - 66%] | 22 (48%) [33% - 63%] | 23 (66%) [48% - 80%] |
| More similar to others than to self (>= 50th perc. rank) | 18 (22%) [14% - 33%] | 14 (30%) [18% - 46%] | 4 (11%) [4% - 28%] |
make_summary_stat_table_v2(res = res_week_24)| cat | All | LACTIN-V | Placebo |
|---|---|---|---|
| More similar to self than to others (< 50th perc. rank) | 63 (78%) [67% - 86%] | 32 (70%) [54% - 82%] | 31 (89%) [72% - 96%] |
| More similar to others than to self (>= 50th perc. rank) | 18 (22%) [14% - 33%] | 14 (30%) [18% - 46%] | 4 (11%) [4% - 28%] |
figs_week_24$BC_distributions$patch + plot_annotation(tag_levels = "A")figs_week_24$PR_distributions$patch + plot_annotation(tag_levels = "A")figs_week_24$BC_and_PR_by_history + plot_annotation(tag_levels = "A")res_week_12 <- recurrent_BV_composition_analysis(mae, target_visit = 4)
figs_week_12 <- make_figures_recurrent_BV_composition(res_week_12)res_week_12$outcome_table| BV | ≥ 50% Lactobacillus | < 50% Lactobacillus |
|---|---|---|
| Yes | 2 | 40 |
| No | 0 | 41 |
| Unknown | 0 | 2 |
Roughly half of participants who had <50% Lactobacillus at the week 12 visit were diagnosed with BV at that visit. 2 participants who were diagnosed with BV at the week 12 visit had ≥50% Lactobacillus at that visit.
figs_week_12$MST + plot_permutations(res_week_12$gt)figs_week_12$main_figuremake_summary_stat_table(res = res_week_12)| cat | All | LACTIN-V | Placebo |
|---|---|---|---|
| Most similar to self (min perc. rank) | 21 (25%) [17% - 36%] | 14 (26%) [15% - 40%] | 7 (24%) [11% - 44%] |
| More similar to self than to others (> min. but < 50th perc. rank) | 49 (59%) [48% - 70%] | 29 (54%) [40% - 67%] | 20 (69%) [49% - 84%] |
| More similar to others than to self (>= 50th perc. rank) | 13 (16%) [9% - 26%] | 11 (20%) [11% - 34%] | 2 (7%) [1% - 24%] |
make_summary_stat_table_v2(res = res_week_12)| cat | All | LACTIN-V | Placebo |
|---|---|---|---|
| More similar to self than to others (< 50th perc. rank) | 70 (84%) [74% - 91%] | 43 (80%) [66% - 89%] | 27 (93%) [76% - 99%] |
| More similar to others than to self (>= 50th perc. rank) | 13 (16%) [9% - 26%] | 11 (20%) [11% - 34%] | 2 (7%) [1% - 24%] |
figs_week_12$BC_distributions$patchWarning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_ydensity()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`position_quasirandom()`).
figs_week_12$PR_distributions$patchWarning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_ydensity()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`position_quasirandom()`).
figs_week_12$BC_and_PR_by_history78% of participants who did not colonized with Lactobacillus end up with a microbiota more similar to their initial BV microbiota than to the microbiota of other participants, suggesting that the antibiotics and study products failed to significantly disrupt their microbiota. That fraction is higher in the Placebo arm (89%) than in the Lactin-V arm (70%) although confidence intervals overlap: 72% - 96% (Placebo) vs 54% - 82% (Lactin-V), and there was no evidence that participants who experienced a larger MTZ-induced shifts in their microbiota composition would be less likely to return to their initial BV microbiota.